-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify merging enrich output #107018
Simplify merging enrich output #107018
Conversation
32c385d
to
5a012b6
Compare
if (cell.length > 1) { | ||
builder.beginPositionEntry(); | ||
} | ||
// TODO: sort and dedup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for a follow-up.
Pinging @elastic/es-analytical-engine (Team:Analytics) |
private final int[] mergingChannels; | ||
private final ElementType[] mergingTypes; | ||
private PositionBuilder positionBuilder = null; | ||
private final EnrichResultBuilder[] builders; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class becomes much simpler.
5a012b6
to
0794d05
Compare
@nik9000 I have benchmarked this change, along with two incoming improvements: one for reading values in batch, and another for leveraging the bytesref ordinal block. With these three changes, enrich in our benchmark is 11 times faster: from 330ms to 30ms.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that is nice!
@nik9000 Thanks for reviewing :) |
Landing elastic#107018 broke running ESQL unit tests in IntelliJ. It has *something* to do with turning on the stringtemplate plugin in the esql project but I don't really know what. After that PR we'd often get errors about trying to regenerate evaluators twice. I dunno. This fixes it. But I don't really know why. The way this fixes it is by making the `esql` project more like the `copmute` project. It makes sense that that would help - they both have the same code generation configuration. Anyway, the operative change is landing the generated files in the same place as the `compute` project. Thus all of the file moves. Again, I have no idea why this works. It's build black magic and I just shook it until it worked. Most of the credit goes to git-bisect for finding the commit that broke this.
Landing #107018 broke running ESQL unit tests in IntelliJ. It has *something* to do with turning on the stringtemplate plugin in the esql project but I don't really know what. After that PR we'd often get errors about trying to regenerate evaluators twice. I dunno. This fixes it. But I don't really know why. The way this fixes it is by making the `esql` project more like the `copmute` project. It makes sense that that would help - they both have the same code generation configuration. Anyway, the operative change is landing the generated files in the same place as the `compute` project. Thus all of the file moves. Again, I have no idea why this works. It's build black magic and I just shook it until it worked. Most of the credit goes to git-bisect for finding the commit that broke this.
The merge logic in MergePositionsOperator is excessively complex and lacks flexibility. It relies on the source operator emitting pages with ascending positions. Additionally, this merge logic introduced an unusual method,
appendAllValuesToCurrentPosition
, to the Block.Builder. We should replace this with a simpler and more flexible approach. This PR uses a mechanism similar to the grouping aggregation. In fact, it is very close to the values aggregation. Initially, I considered using the GroupingState from ValuesAggregator. However, unlike in the values aggregation, we don't expect many multi-values in enrich. Hence, I introduced the new EnrichResultBuilders instead.